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We studied how the inhomogeneity of a sequence affects the phase transition that 
takes place at DNA melting. Unlike previous works, which considered thermody- 
namic quantities averaged over many different inhomogeneous sequences, we focused 
on precise sequences and investigated the succession of local openings that lead to 
their dissociation. For this purpose, we performed Transfer Integral type calculations 
with two different dynamical models, namely the heterogeneous Dauxois-Peyrard- 
Bishop model and the model based on finite stacking enthalpies we recently pro- 
posed. It appears that, for both models, the essential effect of heterogeneity is to 
let different portions of the investigated sequences open at slightly different tem- 
peratures. Besides this macroscopic effect, the local aperture of each portion indeed 
turns out to be very similar to that of a homogeneous sequence with the same length. 
Rounding of each local opening transition is therefore merely a size effect. For the 
Dauxois-Peyrard-Bishop model, sequences with a few thousands base pairs are still 
far from the thermodynamic limit, so that it is inappropriate, for this model, to 
discuss the order of the transition associated with each local opening. In contrast, 
sequences with several hundreds to a few thousands base pairs are pretty close to the 
thermodynamic limit for the model we proposed. The temperature interval where 
power laws holds is consequently broad enough to enable the estimation of critical 
exponents. On the basis of the few examples we investigated, it seems that, for our 
model, disorder does not necessarily induce a decrease of the order of the transition. 
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I. INTRODUCTION 



This article is the last one of a series of three papers aimed at investigating the sta- 



iistical physics of DNA denaturation, i.e. the separation of the two strands upon heating 

1 p n ft n n 

II, 12, 13|, U, 15|, 16|, on the basis of dynamical models like the Dauxois-Peyrard-Bishop one 
, [7, 0] and models we recently proposed to take the finiteness of stacking interactions 



explicitly into account 
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12l |. we analyzed the denat- 



In the first article of the series 
uration of homogeneous sequences at the thermodynamic limit of infinitely long chains. We 
calculated the six fundamental exponents which characterize the critical behaviour of the 



specific heat, the order 



par ameter, the correlation length, etc., by using the Transfer Inte- 



gral (TI) technique |13L Il4|. We showed that for the two investigated models the exponent 
for the specific heat is significantly larger than 1, which indicates that, within the validity 
of these models, denaturation is a first order phase transition. We also checked the validity 
of the four scaling laws which connect the six exponents and observed that Rushbrooke and 
Widom identities are satisfied, but not Josephson and Fisher ones. While the invalidation 
of Fisher identity is without any doubt a consequence of the dimensionality d = 1 of the 
investigated models, we argued that the failure of Josephson identity may well be due to 
the divergence of the order parameter, i.e. the average separation between paired bases. 



The purpose of the second article of the series [15| was to describe how the finite length 
of real sequences affects their critical properties. We characterized in some detail the three 
effects that are observed when the length of homogeneous sequences is decreased, namely, 
the decrease of the critical temperature, the decrease of the peak values of all quantities (like 
the specific heat and the correlation length) that diverge at the thermodynamic limit but 
remain finite for finite sequences, and the broadening of the temperature range over which 
the critical point affects the dynamics of the system. We furthermore performed a finite size 
scaling analysis of the models and showed that the singular part of the free energy can indeed 
be expressed in terms of a homogeneous function. We however pointed out that, because of 
the invalidation of Josephson identity, the derivation of the characteristic exponents which 
appear in the expression of the specific heat requires some care. 

The investigations performed so far [l^, [isl therefore dealt with homogeneous sequences. 
The reason is that homogeneous sequences display only one phase transition, that is, the 
whole sequence opens at a single well-characterized temperature on which theoretical inves- 
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tigations can focus. In contrast, the examination of UV absorption spectra revealed a long 
time ago that the denaturation of sufficiently long inhomogeneous sequences occurs through 
a series of local openings when temperature is increased 16|], which makes this problem 
substantially more difficult to analyze. However, since all real DNA molecules display a 
heterogeneous, almost random-looking, distribution of A, T, G and C base pairs, the statis- 
tical physics description of the denaturation of such inhomogeneous sequences appears as a 
necessity. 

In the language of statistical physics, a heterogeneous distribution of the individual com- 
ponents constituting a complex system is called disorder. One distinguishes field disorder, 
where heterogeneity concerns the distribution of the external field coupled to every compo- 
nent of the system, from bond disorder, which accounts for a heterogeneous distribution of 
the interactions between the elementary components of the system. No external field is con- 
sidered in the present paper, which therefore focuses on bond disorder. The consequences of 
the introduction of disorder in a homogeneous sy stem which displays a second order phase 
transition have been characterized by Harris [17]. According to Harris criterion, disorder 
does not affect the critical behaviour of the homogeneous system if the correlation length 
critical exponent v fulfills the inequality v > 2/d, where d is the dimensionality of the system, 
because this implies that the correlation length is large enough to smear out heterogeneities 
close to the critical point. If Harris criterion is instead violated, then a new critical point 
generally sets in. The exponents of the power laws that are observed in the neighborhood of 
this new critical point satisfy Harris criterion. Harris work was extended a few years later 
by Imry and Wortis 18| to systems with a sharp first order phase transition at the homoge- 
neous limit. On the basis of a heuristic argument, Imry and Wortis suggested that all first 
order transitions of homogeneous systems could well be rounded and transformed to second 
order transitions upon introduction of disorder, except if the dimensionality of the system 
is larger than a certain critical dimensionality dc and its correlation length sufficiently large. 
Note, however, that Imry and Wortis' argumentation explicitly assumes a finite correlation 
length at the critical temperature, while DNA melting corresponds to a somewhat peculiar 
: irst order phase transition with diverging correlation length. More recently, Hui and Berker 
lol . I20I . and Aizenman and Wehr 21 1, showed on the basis of general arguments that if a 
temperature-driven first order phase transition involves a symmetry breaking, then it con- 
verts to a second order phase transition upon introduction of disorder. Otherwise, i.e. if the 
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critical point involves no symmetry breaking, then it is simply eliminated by disorder. Since 
DNA denaturation, as described by the models we investigate, does not involve symmetry 
breaking, this would imply that the denaturation of heterogeneous DNA sequences is neither 
associated with a phase transition nor a succession thereof. 

Beside these general theoretical investigations, the question of the introduction of disor- 
der in DNA sequences has been the subject of recent simulations 22, l23|, |2J, |25|, l26||, which 
dealt with models inspired from the Poland-Scheraga one in the regime where the pure 
model displays a first-order transition, i.e. for a loop exponent c = 2.15 > 2. These studies 



lead to contradictory interpretations. Garel and Monthus [22|, l23[ indeed concluded that the 
transition remains first order in the disordered case, while Coluzzi and Yeramian 24, 3, 26 1 
instead expressed the opinion that the random system undergoes a second order transition. 
It should be emphasized that these studies considered disorder- averaged thermodynamic 
observables and agreed on the point that these observables are not self-averaging at critical 
points, essentially because of the distribution of pseudo-critical temperatures over the en- 



semble of samples 22|, |26l]. In the present work, we will tackle a different question : is it 



sensible to describe the succession of local openings, which take place when the temperature 
of a precise heterogeneous sequence is increased, as a series of local phase transitions and, 
eventually, to specify the order of the local transitions ? 

The remainder of this paper is organized as follows. The dynamical models whose physical 
statistics we investigate are briefly described in Sec. II for the sake of completeness. We 
next derive in Sec. Ill the TI formulae which enable the calculation of the thermodynamic 
properties of finite heterogeneous sequences. We discuss in Sec. IV the critical behaviour 
of the specific heat per particle, cy = Cy/N, the average bubble depths, and the 

correlation length, ^, before concluding in Sec. V. 



II. NONLINEAR HAMILTONIAN MODELS FOR INHOMOGENEOUS DNA 

SEQUENCES 



The Hamiltonians of the two DNA models whose critical behavior is studied in this paper 
are of the form 
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^ = E ^ + "^MiVn) + W^("n2/n,yn-l) , (l) 
n=l ^ ^ 

where ?/„ is the transverse stretching at the rath pair of bases, V^"''(yn) describes the energy 
that binds the two bases of pair n, and W^'^\yn-iyn-i) stands for the stacking interaction 
between base pairs n — 1 and n. The superscripts (n) in these terms indicate that both the 
on-site and the stacking interactions may be site-dependent for inhomogeneous sequences. 
The two models agree in representing the interbase bond V^"'*(l/n) by Morse potentials but 
the expressions for the stacking interactions are rather different. Moreover, the heteroge- 
neous Dauxois-Peyrard-Bishop (DPB) model assumes that heterogeneity is essentially 



carriec 
posed 



by different Morse parameters for AT and GC base pairs, while the models we pro- 
10, lUl are based, like thermodynamic ones [9| , on a set of ten different finite stacking 



enthalpies Aif„ corresponding to all possible oriented successions of base pairs. More pre- 



cisely, for the heterogeneous DPB model 



(2) 



while for our model |lOl], hereafter called the JB model, 

V^\y^)=VM{yn)=D{l-e-'^y-y 



(3) 



The nonlinear stacking interaction in Eq. ([2]) has the particularity of having a coupling 
constant which drops from K{1 + p) to K as the paired bases separate. This decreases the 
rigidity of DNA sequences close to dissociation and results in a sharp first-order transition 
3]. The first term in the expression of Vr*-"^(?/„, yn-i) in Eq- © describes the finite stacking 
interaction and the second one the stiffness of the phosphate/sugar backbone. Introduction 
of finite stacking enthalpies AiJ„ in the model is by itself sufficient to ensure a first-order 
denaturation transition jlol |. 

We used two sets of numerical values for the DBP Hamiltonian. For the calculation of the 
melting profiles discussed in Sec. Ill, we used the set of parameters of Zhang et al. that 
is, Dn = 0.038 eV for AT base pairs, Dn = 0.042 eV for GC base pairs, a„ = 4.2 A ^ for both 



6 



AT and GC base pairs, K = 0.042 eV A ^, p = 0.5, and a = 0.35 A ^ . For the discussion of 
the specific heat critical exponent in Sec. IV, we instead used values that coincide, except 
for the Dn, with those we used in our work on critical exponents |l2|. More explicitely, 
Dn = 0.027 eV for AT base pairs, = 0.033 eV for GC base pairs, a„ = 4.5 A ^ for both 

o —2 9 —1 

AT and GC base pairs, K = 0.06 eV A , p = 1.0, and a = 0.35 A . The ten values of 

CI 

the stacking enthalpies AHn of the JB model were taken from Table 1 of Ref. [9| and the 



id], that is, D = 0.04 eV, a = 4.45 A \ 



other parameters of this model are those of Ref. 
Kb = 10"^ eV A" and b = 0.10 A" . Finally, the reduced mass of each base pair was 
considered to be m = 300 uma in the molecular dynamics simulations reported in the next 
section. 



III. TI CALCULATIONS FOR INHOMOGENEOUS DNA SEQUENCES 

When ignoring the dissociation equilibrium 5*2 ^ 25", which properly governs the sep- 
aration of the two complementary strands (5*) when the last base pair of double-stranded 
DNA (S'2) opens 0, O, js, [l^, and neglecting the trivial term arising from kinetic energy, 
the partition function for the DNA models of Eq. ([1]) with open boundary conditions can 
be expressed as 



where (3 = {ksT)'^ is the inverse temperature. The TI method fisl . \\\ is a technique 
that allows for the efficient computation of Z. While this method was originally developed 
to investigate homogeneous sequences at the thermodynamic limit of infinitely long chains 
Zhang et al. 14] have shown how it can be adapted to finite sequences described by 
the heterogeneous DPB model. It turns out that, because of the symmetric form of the 
Hamiltonian for the JB model, TI calculations are quite simpler for this model than for the 
DPB one. In this section, we first indicate the successive steps for calculating the partition 
function of the JB model and, consequently, its free energy, entropy, and specific heat. We 
next derive expressions for two-point correlation functions. 

The first step for calculating Z consists in rewriting Eq. (jlj) in the form 
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Z = j dy.dy^--- dy^e-f'^'^'^^'^/'K, (y,, y,) {y,, y,) ■ ■ ■ {y^, y^.,) e'P'^Mku.M^ ^ (5) 
where the TI kernel Kn{y-, x) for base pair n interacting with base pair n — 1 has the form 



Kniy,x) = exp 



X] 



W^-\y, 



x\ 



(6) 



For the DPB model, Kn{y, x) is not symmetric {Kn{y, x) 7^ Kn{x, y)) when base pairs n and 



n 



1 are different. Zhang et al. 



14l |. who used the DBF model, therefore had to develop a 



symmetrization procedure that makes the all scheme more complex. In contrast, Kn{y,x) is 
symmetric {Kn{y,x) = Kn{x,y)) for the JB model, whatever the base pairs at positions n 
and n — 1, so that no additional symmetrization procedure is required. For the JB model, the 
essential difference between the procedures for homogeneous and inhomogeneous sequences 
consequently arises from the fact that ten different kernels need be considered, one for each 
possible succession of two base pairs [9| . The trick borrowed from method 2 of Zhang et al. 



ij] then consists in developing each kernel in a different orthonormal basis 



x] 



(7) 



where the and {A^^} are the eigenvalues and eigenvectors of the TI operator and 

satisfy the equation 



dxKn{x,y)^ 



(8) 



By defining 



B'ij = \J \ O-i 0,j 

and substituing the kernel expansion of Eq. ([7j) into Eq. ([5]), the partition function can be 
rewritten in the form 
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or, equivalent ly, 



Z = Tr {BD(^)d(^) ■ ■ ■ d(n-i)d(n) } , (11) 

where B stands for the matrix with elements Bij, D*^"-* for the matrix with elements -D|J\ 
and Tr indicates that one must take the trace of the product of matrices. Finally, the free 
energy F, the entropy S, and the specific heat Cy are obtained from Z according to 



F = -keTlniZ) 

dF 
df 



dF 

S = -7^ (12) 



d^F 

Calculation of intensive thermodynamical functions proceeds only similar lines. For example, 
the mean elongation of the n'th base pair can be written in the form 

iVn) = \jdy,dy2--- dyNyne-^''^^''^'^K2 (1/2, Vi) K, {y,, y^) ■ ■ ■ Kr, (y^, yN-i) e'^^-^^-^/^ 

(13) 

Defining 



6« = j dye-^^-^yy^^f\y)y 
bf^ = J rfi/e-^^-(^)/2<l>f^(i/)2/ 

= ^>i^a^^bf (14) 

= y^pAp I dy<i>t'\y)<^f{y)y, 

and substituting Eq. ([7]) into Eq. f|T3l) . the mean elongation is obtained in the form 

= Irr |bd(^)dW ■ • ■ D(")Yi"+')D("+2) ■ ■ ■ D^^)} (15) 
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for n 7^ 1 and n ^ N, and 

(^^) = Irr {C(")d(3)dW ■ ■ ■ d(n-i)d(n)} (16) 

Zj 

at the extremities of the chain, that is, for n = \ ox n = N . 

Two-point correlation functions are derived in the same manner. One obtains 

f (17) 
if m and n are different from 1 and A^, 

f ^ ^ (18) 

Zj 

if m is different from 1 and but n is equal to 1 or TV, and 

l^y^y^) = ^Tr {e(in)d W . . . d(n-i)d(n) } . (19) 

Zj 

In Eqs. (HZD-dlSl) we noted 

Bif ^ ^^af^cf ^^^^ 

ySI = ^Ap^ I dy<!>t'\y)^^;\y)y' ■ 

In order to check the accuracy of the TI procedure, we compared melting profiles ob- 
tained with this method to those obtained from molecular dynamics (MD) simulations. MD 
simulations consist in integrating numerically Langevin equations of motion 
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= 'Wn' + V/^^^W • (21) 

The second and third term in the right-hand side of this equation model the effects of the 
solvent on the sequence. 7 is the dissipation coefficient (we assumed 7 = 5 ns~^ as in Refs. 



IG 



11 



28| ) and w{t) a normally distributed random function with zero mean value and unit 
variance. Step by step integration, with 10 fs steps, was performed by applying a second 
order Briinger-Brooks-Karplus integrator [29] to the sequence initially at equilibrium at 
K and subjected to a temperature ramp of 10 K/ns. This slow heating insures that the 
temperature of the system, estimated from its average kinetic energy 



N 



p: 

n=l 



^k^-- NkB^2m' ^^^^ 



closely follows the temperature T imposed by the random kicks. Once the required tempera- 
ture was reached, Langevin equations were integrated at constant temperature for additional 
30 ns in order to bring the system still closer to thermal equilibrium. We finally averaged 
the base pair separations i/n over time intervals which varied between 1 fis for temperatures 
substantially smaller than the melting one, up to 5 fis close to melting, in order to corre ctly 
average the low frequency thermal fluctuations which develop close to the critical point |28| . 
During the averaging process, we went on recording the physical temperature of the sys- 
tem (Eq. (122|) ). because its final agreement with the imposed temperature T provides an 
estimate of the quality of the averaging. For all the results presented below, the differences 
between the two temperatures were kept below 0.1 K. 

Figs. 1 and 2 show the melting profiles as a function of n at increasing temperatures 
for, respectively, the 1793 base pairs (bp) human /?-actin cDNA (NCB entry code NM_00110) 
and the 2399 bp inhibitor of the hepatocyte growth factor activator [3^, which were obtained 
rom TI calculations with the JB model. In contrast with the estimation of critical exponents 
1^ . [isl . this kind of plots does not require a very high precisioii, so that the grid on which 



the matrix representations of the TI kernels Kn{y, x) were built 13| consisted of only 2901 y 
values regularly spaced between ymin = — 100/a and ymax = 2800/a with steps of 1/a. The 
melting profiles for the actin sequence at 322 K and 346 K obtained from TI calculations 
and MD simulations performed with the JB model are compared in Fig. 3. It is seen 
that even tiny details coincide for the two curves at 322 K (bottom plot). The agreement 
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remains excellent closer to denaturation. In particular, both methods conclude that all base 
pairs with n > 1200 are open at this temperature. We will come back to this point later. 
Note that resolution with respect to base pair positions is, however, substantially higher 
in TI results, although TI calculations were more rapid than MD simulations by a factor 
of almost 10 close to melting. In spite of the fact that the TI procedure is much more 
CPU demanding for inhomogeneous sequences than for homogeneous ones, it therefore still 
appears as a very powerful tool compared to MD simulations. Fig. 4 compares melting 
profiles for the actin sequence at 350 K obtained from TI and MD calculations performed 
with the heterogeneous DPB model. Although the agreement is again excellent, it is seen 
that the TI profile looks like as if it consisted of 3 or 4 superposed curves. This is most 
probably due to the conjunction of two phenomena : (i) the heterogeneous DBF model 
assumes that the Morse interaction for GC base pairs is stronger than that for AT base 
pairs, and (ii) the resolution of the TI procedure is high enough to reflect the variations 
of (yn) at the level of single base pairs that result from this difference. To confirm this 
hypothesis, we checked that the same phenomenon does show up for the JB model. Still, 
since this model assumes that heterogeneity is carried by stacking interactions instead of on- 
site potentials, superposed curves essentially appear in the plots of — ?/„_i) as a function 
of n. Moreover, the phenomenon is somewhat attenuated compared to Fig. 4, because the 
JB model considers ten different stacking enthalpies, while the DFB one considers only two 
different Morse potential strengths. Finally, Fig. 5 shows the melting curve, that is, the 
evolution with temperature of the portion of open base pairs, for the 1793 bp actin sequence 
obtained with the JB model. Although they were computed with different models, this curve 



compares very well with the one drawn in Fig. 4 of Ref. |ll |. 

In conclusion, the TI procedure appears as a powerful and trustful tool for the computa- 
tion of the thermodynamic properties of inhomogeneous DNA sequences. 

IV. EFFECTS OF DISORDER CLOSE TO MELTING 



In this section, we will inve stig ate the ro 



contrast with previous studies 22 



23 



24 



25 



e o 



disorder close to the critical point. In 



26| . we will not consider disorder-averaged 



quantities, that is, we will not discuss the statistical physics of an ensemble of random se- 
quences. Instead, we will focus on precise sequences and try to determine if the successive 
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openings that lead to the dissociation of these sequences may be described as phase transi- 
tions, and eventually address the question of the order of these transitions. To this end, we 
will study the behaviour of the specific heat per particle, cy = Cy/N, the average bubble 
depths, (yn), and the correlation length, ^, close to the critical temperature. 

A. Critical behaviour of cy 

The evolution of cy with temperature was computed for the 1793 bp actin and the 2399 
bp inhibitor according to Eqs. (11) and (12). Finite differences were used to estimate the 
second derivative of Z in Eq. (12). The results obtained with grids of 2901 values of y 
regularly spaced between — 100/a and 2800/a are shown in Fig. 6. The evolution of cy in 
these plots is most easily understood when comparing them to the corresponding profiles in 
Figs. 1 and 2. The bottom plot in Fig. 1 indeed indicates that the average AT content for the 
1793 bp actin sequence is substantially higher for base pairs with n > 1150. It is seen in the 
top plot of Fig. 1, that one third of the sequence (the base pairs with n > 1150) consequently 
melts around 346-348 K, while the remaining two thirds (the base pairs with n < 1150) melt 
at the slightly higher temperature of about 354 K. This two-steps denaturation is perfectly 
reflected in the temperature evolution of cy (top plot of Fig. 6), which displays two peaks 
with 1 : 2 relative intensities centred around 348 and 354 K. For the 2399 bp inhibitor, the 
bottom plot of Fig. 2 similarly indicates that the average AT content is rather uniform in 
the sequence, except that it significantly decreases with decreasing n for the first 600 base 
pairs. Not surprisingly, it is accordingly seen in the top plot of Fig. 2 that these first 600 
base pairs melt about three degrees above the temperature of 352-354 K where the rest of 
the sequence dissociates. Since this second melting step involves only about one fourth of 
the sequence and takes place very close to the first step, it merely appears as a shoulder on 
the high temperature side of the plot of cy in the bottom plot of Fig. 6. 

In order to learn more about these openings, we next draw log-log plots of the evolution 
of Cy as a function of the reduced temperature t, defined according to 

^=1-^- (23) 

In the case of homogeneous sequences, the critical temperature that appears in Eq. (125]) 
is unambiguously defined. This is no longer the case when dealing with inhomogeneous 
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sequences, so that in the following we will explicitly state which temperature is used as 
Tc. Moreover, this kind of plot requires more precision than the previous figures. The 
calculation of Z in Eq. (|TT1) was therefore performed with grids of 4101 values of y regularly 
spaced between — 100/a and 4000/a. The result obtained for the JB model and the 2399 
bp inhibitor is displayed in the bottom plot of Fig. 7. The solid line shows the result for 
the 2399 bp inhibitor sequence, while the dashed and dot-dashed lines show results that we 
previously obtained for a 2000 bp homogeneous sequence and an infinitely long homogeneous 
sequence, respectively (see the bottom plot of Fig. 3 of Ref. isl). For the inhomogeneous 
sequence, Tc was taken as the temperature where cy is maximum (for the grid with 4201 
points, we numerically obtained Tc = 354.34 K), so that the solid line actually deals with the 
first step of the melting of the inhibitor sequence, that is, the opening of the base pairs with 



n > 600. In Ref. 15|, we arrived to the conclusion that the thermodynamics of sequences 
with a few thousands base pairs are close to that of infinite ones down to t ~ 10"^ for the 
JB model. As a consequence, the curves for the 2000 bp and infinitely long homogeneous 
sequences are almost superposed above this threshold. Stated in other words, the rounding 
of the phase transition is hardly noticeable for temperatures which differ from the critical 
one by more than a few tenths of a degree. Examination of the bottom plot of Fig. 7 
further shows that the thermodynamics of the opening of the 1800 base pairs with n > 600 
of the inhibitor sequence is also very similar to that of the finite {N = 2000) and infinite 
homogeneous sequences : rounding is indeed imperceptible about one degree (t ~ 3.10^'^) 
below the critical temperature. The power law dependence of cy against t therefore extends 
over an interval of t values which is sufficiently large to allow for the estimation of the 
critical exponent a of cy. One obtains a = 1.07, which is characteristic of a first order 
phase transition. 

The top plot of Fig. 7 also displays a log-log plot of the evolution of cy with t com- 
puted, however, with the heterogeneous DBF model. For the grid with 4201 points and 
this model, we found Tc = 284.24 K. We showed in Ref. 151] that, in contrast with the 
JB model, sequences with = 2000 bp are still far from the thermodynamic limit for the 
DBF model. Therefore, the dashed curve (homogeneous sequence with = 2000 bp) and 
the dot-dashed one (homogeneous sequence at the thermodynamic limit) are well separated. 
Examination of this plot also indicates that the (solid) curve for the inhomogeneous 2399 
bp inhibitor sequence is again qualitatively close to the (dashed) curve for the homogeneous 
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2000 bp sequence - and consequently quite separated from the curve for the sequence at the 
thermodynamic hmit. 

One might therefore tentatively conclude from the results presented in this subsection 
that, for a given sequence, the essential effect of heterogeneity is to let different portions of 
the sequence open at slightly different temperatures. Besides this global effect, the dynamics 
of the local aperture of each portion is indeed very similar to that of a homogeneous sequence 
with the same length. We will now investigate the critical behaviour of the depth of the 
bubbles and of the correlation length, in order to check whether they confirm this conclusion. 

B. Critical behaviour of the bubble depth 

As we already noted, the 1793 bp actin sequence opens in two fairly separated steps : the 
base pairs with n > 1150 melt around 348 K, while those with n < 1150 melt at the slightly 
higher temperature of 354 K (see Figs. 1, 5 and 6). Finer details can be observed in Fig. 
1. It is indeed seen that melting of the n > 1150 portion is driven by three bubbles centred 
around n — 1300, n — 1450 and n — 1610, while melting of the n < 1150 portion is driven by 
two bubbles centred around n = 318 and n = 441, the centre of each bubble corresponding 
to a local maximum of the AT percentage. Fig. 8 displays log-log plots of the average depth 
of each bubble, (?/„), as a function of the reduced temperature t, obtained with the JB model. 
For the three bubbles with n > 1150 (top plot), the critical temperature was taken as the 
temperature Tc — 348.2 K of the secondary maximum of the specific heat, while for the two 
bubbles with n < 1150 (bottom plot), the critical temperature was taken as the temperature 
Tc = 353.9 K of the principal maximum of cy. Fig. 8 indicates that (i) the average depth 
of all bubbles exhibits a power law dependence against t over a reasonably large interval of 
temperatures, (ii) the slopes are essentially identical for all bubbles belonging to the same 
portion of the sequence, and (iii) the critical exponents that can be deduced from these 
slopes, that is, -1.28 for the bubbles with n > 1150 and -1.00 for the bubbles with n < 1150, 
are close to the critical exponent /3 = —1.31 we obtained at the thermodynamic limit [12]. 



15 



C. Critical behaviour of the correlation length ^ 

At the thermodynamic hmit of infinitely long chains, the two-point spatial autocorrelation 
function 

Ci, = {y^y,)-{y^){yJ) (24) 
varies for large values of |i — j| according to 

dj (X exp{-\i - , (25) 

where ^ is the correlation length 



131 ] ■ ^ can consequently be obtained as the inverse of the 
slope in the plots of ln{Cij) as a function of \i — j\. Such plots are shown in Fig. 9 for a 
homogeneous sequence with 10000 base pairs described with the homogeneous version of the 



JB model 
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15j. It is seen that the natural logarithm of Cij indeed evolves linearly 
with j — i over more than 20 orders of magnitudes and that the correlation length ^ can 
be determined very accurately from the slope of these curves. When plotting the values 
of ^ obtained in this way as a function of t (critical temperature is Tn = 367.47 K), one 



furthermore recovers the critical exponent u = 1.23 reported in Ref. [12l]. Similar plots of 
ln{Cij) as a function of j — z, obtained from Eqs. (|T5l) . ( ITTl) . and llMl) . are reported in Fig. 10 
for the 1793 bp actin sequence described with the JB model. The main plot was obtained by 
setting i = 180 and the smaller one by setting i = 1250. The horizontal and vertical scales 
are identical for both plots, but the smaller one {i = 1250) was horizontally shifted so that 
identical values of j are vertically aligned. Examination of Fig. 10 indicates that all curves 
in the main plot and some curves in the smaller plot are composed of two segments instead 
of a single straight line, and that the values of j where the two segments cross approximately 
coincide, for each temperature, with the boundary between the double-stranded and open 
portions of the sequence. Moreover, local slopes are much smaller whenever i and/or j lie in 
the open portion of the sequence. By comparing the two plots in Fig. 10, one finally notices 
that absolute values of ln{Cij) are different for different values of z, but that their variations 
are identical for identical values of j. These two observations suggest that for inhomogeneous 
sequences the two-point spatial correlation function Cij still evolves exponentially with |«— j|, 
as in Eq. (125!) . but that there exists one different correlation length ^ for each region that 
melts independently from the rest of the sequence. Note that it is then quite appropriate 
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to call these regions coherence regions. At last, we checked that the correlation lengths 
obtained from the slopes of the first segments in the main plot of Fig. 10 evolve as t~^'^^ 
(Tc = 353.9 K, as in the bottom plot of Fig. 8). Therefore, the correlation length critical 
exponent for the portion of the sequence with n < 1150 is again close to the above mentioned 
value 1/ = 1.23 for homogeneous sequences [l^ . 



V. CONCLUSION 



In this work, we analyzed the statistical physics of inhomogeneous DNA sequences close to 
denaturation. Unlike previous studies, which considered disorder-averaged thermodynamic 
observables, we focused on the successive local openings of precise sequences. To this end, 
we used the extended TI method of Zhang et al [l^ to investigate the properties of the 
heterogeneous DPB model [8|, and derived a modified version of this method to adapt it 
to the study of the JB model [lO, 12, 15|. Examination of the critical behaviour of the 



specific heat per particle, cy, the average bubble depths, (?/„), and the correlation length, 
^, leads to the following conclusions. Both models agree in pointing out that the principal 
effect of heterogeneity is to let different portions of the sequence open at slightly different 
temperatures. Besides this global effect, the dynamics of the local aperture of each portion is 
indeed very similar to that of a homogeneous sequence with the same length. In particular, 
the local melting transition of each portion is rounded by finite size effects 15|. Strictly 
speaking, one should therefore not describe the melting of an inhomogeneous sequence as a 
succession of phase transitions. When speaking more loosely, such a description is however 
not really wrong, in the sense that the melting of several hundreds or a few thousands 
of base pairs is accompanied by a sharp maximum of th e sp ecific heat and a clear step 
of the entropy (see Fig. 6 and Figs. 2 and 3 of Ref. 15|). The answer to the more 
involved question concerning the possibility to ascribe an order to these rounded transitions 
unfortunately turns out to depend on the model which is used to describe DNA. Indeed, for 
the JB model, sequences (or portions thereof) with several hundreds to a few thousands base 
pairs are already rather close to the thermodynamic limit (see the bottom plot of Fig. 7 and 
Figs. 3 and 4 of Ref. {isl), so that power laws are observed over significant temperature 
intervals. For the 2399 bp inhibitor and the 1793 bp actin sequences, the values of the 
critical exponents estimated on these temperature intervals turn out to be close to those of 
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homogeneous sequences at the thermodynamic hmit. In particular the specific heat critical 
exponent we obtained for the opening of the 1800 base pairs with n > 600 of the inhibitor 
sequence, a = 1.07, is characteristic of a first order phase transition. Of course, it is not 
possible to draw a general conclusion from a single example, but this calculation still has 
the merit of showing that disorder does not necessarily reduce the order of the transition. 
In contrast, for the DPB model, sequences with a few thousands base pairs are still quite 



far from the thermodynamic limit (see the top plot of Fig. 7 and Fig. 3 of Ref. [iSj), so 



that it is not appropriate to discuss the order of the melting transition for inhomogeneous 
sequences described by this model. 

Last but not least, it should be emphasized that the two Morse parameters Dn for AT 
and GC pairing and the ten stacking enthalpies AHn cannot be extracted independently 



from experimental denaturation curves 
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33|. It has however been shown recently how 
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33|. The 



these twelve quantities can be obtained from the properties of nicked DNA 
free energies reported in Table 1 of Ref. indicate that heterogeneity in improved dy- 
namical models of DNA secondary structure should be carried by both pairing and stacking 
energies. It will therefore be very instructive to build a dynamical model centred on these 
data and check whether the description of the melting phase transition of inhomogeneous 
DNA obtained from this model matches that obtained with the DPB or the JB models (note 



that the new parameters have already been used in statistical models, see [3J]). Aside from 
the adjustment of the remaining free parameters of the model against experimental melting 
curves, the major difficulty of this task will consist in establishing a TI calculation procedure 
that allows to take into account the heterogeneity of both pairing and stacking energies. 
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FIGURE CAPTIONS 



Figure 1 : (color online) : (Top) plot, for increasing temperatures, of {yn) as a function 
of the site number n for the 1793 bp human /5-actin cDNA sequence (NCB entry code 
NM_001101). These curves were obtained from TI calculations performed with the JB 
model. (Bottom) plot, as a function of n, of the AT percentage averaged over 40 consecutive 
bp of the actin sequence. 

Figure 2 : (color online) (Top) plot, for increasing temperatures, of (?/„) as a function 
of the site number n for the 2399 bp inhibitor of the hepatocyte growth factor activator 



sequence 30|]. These curves were obtained from TI calculations performed with the JB 
model. (Bottom) plot, as a function of n, of the AT percentage averaged over 40 consecutive 
bp of the inhibitor sequence. 

Figure 3 : (color online) Comparison of (?/„) profiles for the 1793 bp actin sequence at 
322 K (bottom plot) and 346 K (top plot) obtained from TI calculations (dashed lines) and 
MD simulations (solid lines) performed with the JB model. The main plots show the profile 
of the whole sequence, while the inserts zoom in on 300 base pairs. 

Figure 4 : (color online) Comparison of (?/„) profiles for the 1793 bp actin sequence at 350 
K obtained from TI calculations (small crosses) and MD simulations (solid line) performed 
with the heterogeneous DPB model. The main plot shows the profile of the whole sequence, 
while the insert zooms in on 300 base pairs. 

Figure 5 : (color online) Plot of the fraction of open base pairs as a function of tem- 
perature T for the 1793 bp actin sequence, obtained from TI calculations performed with 
the JB model. The criterion for a base pair n to be open is that (?/„) be larger than the 
threshold of 10 A. 

Figure 6 : (color online) Plots of the specific heat per particle cy as a function of 
temperature T for the 1793 bp actin sequence (top plot) and the 2399 bp inhibitor sequence 
(bottom plot), obtained from TI calculations performed with the JB model, cy is expressed 
in units of the Boltzmann constant kB- 
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Figure 7 : (color online) Log-Log plots of the specific heat per particle cy as a function 
of the reduced temperature t for the 2399 bp inhibitor sequence (solid lines), a 2000 bp 
homogeneous sequence (dashed lines), and an infinitely long homogeneous sequence (dot- 
dashed lines), obtained from Tl calculations performed with the JB model (bottom plot) 
and the DPB model (top plot), cy is expressed in units of the Boltzmann constant ks- 

Figure 8 : (color online) Log-Log plots, as a function of the reduced temperature t, of 
the average depth of bubbles centred around n — 1300, n — 1450 and n = 1640 (top 
plot), and n — 318 and n — 441 (bottom plot) for the 1793 bp actin sequence. These results 
were obtained from TI calculations performed with the JB model. The critical temperature 
of each portion of the sequence is indicated on the corresponding plot. 

Figure 9 : (color online) Plots of ln{Cij) as a function of \i — j\ for a homogeneous 
sequence with 10000 base pairs at several temperatures comprised between 340 K and 367.2 

K. i = 1 for all the plots. These results were obtained from Tl calculations performed with 
the homogeneous JB model. Critical temperature of this system is Tc = 367.47 K. 

Figure 10 : (color online) Plots of ln{Cij) as a function of |^ — j | for the 1793 bp actin 
sequence at several temperatures regularly spaced between 340 K and 350 K. i — 180 for 
the main plot and i — 1250 for the smaller vignette. The horizontal and vertical scales are 
identical for both plots, but the vignette (i = 1250) was horizontally shifted so that identical 
values of j are vertically ahgned.These results were obtained from TI calculations performed 
with the JB model. 
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